Multi-frame GMM-based block quantisation of line spectral frequencies
Abstract
In this paper, we investigate a Gaussian mixture model (GMM)-based block quantiser for coding line spectral frequencies (LSFs) that operates on multiple frames and uses mean squared error as the quantiser selection criterion. As a viable alternative to vector quantisers, the GMM-based block quantiser offers low computational and memory requirements as well as bitrate scalability. Jointly quantising multiple frames exploits the correlation across successive frames, which leads to more efficient block quantisation. The efficiency gained from joint quantisation permits the use of the mean squared error distortion criterion for cluster quantiser selection, rather than the computationally expensive spectral distortion. The distortion performance gains come at the cost of an increase in computational complexity and memory. Experiments on narrowband speech from the TIMIT database demonstrate that the multi-frame GMM-based block quantiser can achieve a spectral distortion of 1 dB at 22 bits/frame, or 21 bits/frame with some added complexity. © 2005 Published by Elsevier B.V.
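The two ideas in the abstract, stacking successive frames into one vector and choosing the cluster quantiser by mean squared error, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy LSF values, the step size and the index limit are all hypothetical. An identity matrix stands in for each cluster's decorrelating transform, and a clamped uniform quantiser stands in for the bit-allocated scalar quantisers of a real block quantiser.

```python
def stack_frames(frames, p):
    """Concatenate every p successive frames into one vector.

    A trailing group of fewer than p frames is dropped.
    """
    blocks = []
    for start in range(0, len(frames) - p + 1, p):
        block = []
        for frame in frames[start:start + p]:
            block.extend(frame)
        blocks.append(block)
    return blocks


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cluster_quantise(x, mean, basis, step, max_index):
    """Toy cluster quantiser (assumption, not the paper's design).

    Removes the cluster mean, projects onto the rows of an orthonormal
    basis, quantises each coefficient uniformly with a clamped index,
    then inverts the transform and restores the mean.
    """
    n = len(x)
    centred = [xi - mi for xi, mi in zip(x, mean)]
    coeffs = [sum(r * c for r, c in zip(row, centred)) for row in basis]
    indices = [max(-max_index, min(max_index, round(c / step))) for c in coeffs]
    dequant = [i * step for i in indices]
    return [mean[j] + sum(basis[k][j] * dequant[k] for k in range(n))
            for j in range(n)]


def select_cluster_mse(x, clusters, step=0.05, max_index=4):
    """Quantise x with every cluster and keep the one with the lowest MSE."""
    best = None
    for m, (mean, basis) in enumerate(clusters):
        recon = cluster_quantise(x, mean, basis, step, max_index)
        d = mse(x, recon)
        if best is None or d < best[2]:
            best = (m, recon, d)
    return best  # (cluster index, reconstruction, distortion)


# Toy data: four frames of two made-up LSF values each, stacked in pairs.
frames = [[0.30, 0.90], [0.32, 0.88], [1.10, 2.00], [1.12, 2.02]]
blocks = stack_frames(frames, p=2)

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
clusters = [
    ([0.3, 0.9, 0.3, 0.9], identity),  # hypothetical cluster 0
    ([1.1, 2.0, 1.1, 2.0], identity),  # hypothetical cluster 1
]

choices = [select_cluster_mse(b, clusters)[0] for b in blocks]
```

Because the selection loop only compares squared errors of the reconstructions, no spectral distortion has to be evaluated per candidate cluster, which is the saving the abstract refers to.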
Similar resources
A comparative study of LPC parameter representations and quantisation schemes for wideband speech coding
In this paper, we provide a review of LPC parameter quantisation for wideband speech coding as well as evaluate our contributions, namely the switched split vector quantiser (SSVQ) and multi-frame GMM-based block quantiser. We also compare the performance of various quantisation schemes on the two popular LPC parameter representations: line spectral frequencies (LSFs) and immittance spectral pa...
Scalable distributed speech recognition using Gaussian mixture model-based block quantisation
In this paper, we investigate the use of block quantisers based on Gaussian mixture models (GMMs) for the coding of Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. Specifically, we consider the multi-frame scheme, where temporal correlation across MFCC frames is exploited by the Karhunen–Loève transform of the block quantiser. Comp...
Improved noise-robustness in distributed speech recognition via perceptually-weighted vector quantisation of filterbank energies
In this paper, we examine a coding scheme for quantising feature vectors in a distributed speech recognition environment that is more robust to noise. It consists of a vector quantiser that operates on the logarithmic filterbank energies (LFBEs). Through the use of a perceptually-weighted Euclidean distance measure, which emphasises the LFBEs that represent the spectral peaks, the vector quanti...
Gaussian Mixture Model-based Quantization of Line Spectral Frequencies for Adaptive Multirate Speech Codec
In this paper, we investigate the use of a Gaussian Mixture Model (GMM)-based quantizer for quantization of the Line Spectral Frequencies (LSFs) in the Adaptive Multi-Rate (AMR) speech codec. We estimate the parametric GMM model of the probability density function (pdf) for the prediction error (residual) of mean-removed LSF parameters that are used in the AMR codec for speech spectral envelope...
Switched split vector quantisation of line spectral frequencies for wideband speech coding
In this paper, we investigate the use of the switched split vector quantiser (SSVQ) for coding short-term spectral envelope information for wideband speech coding. The SSVQ is the hybrid of a switch vector quantiser and split vector quantiser, which has been shown in previous studies to be more efficient, in terms of rate-distortion, as well as possessing low computational complexity, than the ...
Journal title:
- Speech Communication
Volume 47, Issue
Pages -
Publication date 2005